    Stream Aggregation Through Order Sampling

    This paper introduces a new single-pass reservoir weighted-sampling stream aggregation algorithm, Priority-Based Aggregation (PBA). While order sampling is a powerful and efficient method for weighted sampling from a stream of uniquely keyed items, no current algorithm realizes the benefits of order sampling in the context of stream aggregation over non-unique keys. A naive approach, which order-samples items regardless of key and then aggregates the results, is hopelessly inefficient. In contrast, our proposed algorithm uses a single persistent random variable across the lifetime of each key in the cache, and maintains unbiased estimates of the key aggregates that can be queried at any point in the stream. The basic approach can be supplemented with a Sample and Hold pre-sampling stage whose sampling rate adaptation is controlled by PBA. This approach considerably reduces computational complexity compared with the state of the art in adapting Sample and Hold to operate with a fixed cache size. Concerning statistical properties, we prove that PBA provides unbiased estimates of the true aggregates. We analyze the computational complexity of PBA and its variants, and provide a detailed evaluation of its accuracy on synthetic and trace data. Relative to Adaptive Sample and Hold, weighted relative error is reduced by 40% to 65% at sampling rates of 5% to 17%; there is also substantial improvement for rank queries. Comment: 10 pages
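    To make "order sampling" concrete, here is a minimal Python sketch of priority sampling, a well-known order-sampling scheme for uniquely keyed weighted items. Each item receives priority w/u with u uniform on (0, 1]; the k highest priorities are kept, and the (k+1)-th highest priority tau becomes a threshold, so each retained weight is estimated as max(w, tau), which yields unbiased subset-sum estimates. This is not PBA itself: it does not aggregate repeated keys, and the function name priority_sample and its parameters are hypothetical.

```python
import heapq
import random
from typing import Dict, Iterable, List, Tuple


def priority_sample(stream: Iterable[Tuple[str, float]], k: int,
                    seed: int = 0) -> Dict[str, float]:
    """Single-pass priority sampling over (key, weight) pairs.

    Assumes every key is unique and every weight is positive
    (the setting where plain order sampling applies; hypothetical helper).
    Returns an unbiased weight estimate for each sampled key.
    """
    rng = random.Random(seed)
    # Min-heap holding the k+1 largest (priority, key, weight) entries seen so far.
    heap: List[Tuple[float, str, float]] = []
    for key, w in stream:
        u = 1.0 - rng.random()          # u in (0, 1], so w / u never divides by zero
        entry = (w / u, key, w)
        if len(heap) < k + 1:
            heapq.heappush(heap, entry)
        elif entry[0] > heap[0][0]:
            heapq.heapreplace(heap, entry)  # evict the smallest priority
    if len(heap) <= k:
        # Fewer than k+1 items: nothing was dropped, so weights are exact.
        return {key: w for _, key, w in heap}
    # The smallest retained priority is the (k+1)-th largest: the threshold tau.
    tau, _, _ = heapq.heappop(heap)
    return {key: max(w, tau) for _, key, w in heap}
```

The bounded min-heap keeps the pass single and memory fixed at k+1 entries, which is what makes order sampling attractive for streams; handling repeated keys without this naive per-item treatment is the gap the abstract says PBA fills.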

    Hopper: Decentralized Speculation-aware Cluster Scheduling at Scale

    As clusters continue to grow in size and complexity, providing scalable and predictable performance is an increasingly important challenge. A crucial roadblock to predictable performance is stragglers, i.e., tasks that take significantly longer than expected to run. To date, speculative execution has been widely adopted to mitigate the impact of stragglers. However, speculation mechanisms are designed and operated independently of job scheduling, even though scheduling a speculative copy of a task directly affects the resources available to other jobs. In this work, we present Hopper, a speculation-aware job scheduler, i.e., one that integrates the tradeoffs associated with speculation into job scheduling decisions. We implement both centralized and decentralized prototypes of the Hopper scheduler and show that coordinating scheduling and speculation achieves improvements of 50% (66%) over state-of-the-art centralized (decentralized) schedulers and speculation strategies.

    The Dark Menace: Characterizing Network-based Attacks in the Cloud

    As the cloud computing market continues to grow, the cloud platform is becoming an attractive target for attackers seeking to disrupt services, steal data, and compromise resources to launch further attacks. In this paper, using three months of 2013 NetFlow data from a large cloud provider, we present the first large-scale characterization of inbound attacks towards the cloud and outbound attacks from the cloud. We investigate nine types of attacks, ranging from network-level attacks such as DDoS to application-level attacks such as SQL injection and spam. Our analysis covers the complexity, intensity, duration, and distribution of these attacks, highlighting the key challenges in defending against attacks in the cloud. By characterizing the diversity of cloud attacks, we aim to motivate the research community to develop future security solutions for cloud systems.

    Systematic Analysis of Survival-Associated Alternative Splicing Signatures in Thyroid Carcinoma

    Alternative splicing (AS) is a key mechanism in the regulation of gene expression and is closely related to tumorigenesis. The incidence of thyroid cancer (THCA) has increased during the past decade, yet the role of AS in THCA is still unclear. Here, we used TCGA and to generate AS maps in patients with THCA. Univariate analysis revealed 825 AS events associated with survival in THCA. Five prognostic models, for AA, AD, AT, ES, and ME events, were obtained through LASSO and multivariate analyses, and the final prediction model was established by integrating all the AS events in these five models. Kaplan–Meier survival analysis revealed that overall survival of patients in the high-risk group was significantly shorter than that of patients in the low-risk group. ROC analysis showed that the prognostic capability (AUC) of each model at 3, 5, and 8 years exceeded 0.7, and that of the final model exceeded 0.9 in all cases. We validated the established THCA gene model against other databases and by qPCR. In addition, gene set enrichment analysis showed that abnormal AS events might play key roles in the development and progression of THCA by participating in changes in molecular structure, homeostasis of the cellular environment, and cellular energy. Finally, a splicing correlation network was established to reveal potential regulatory patterns between the predicted splicing factors and candidate AS events. In summary, AS should be considered an important prognostic indicator in THCA. Our results will help elucidate the mechanism underlying AS in THCA tumorigenesis and broaden the prognostic and clinical application of molecular targeted therapy for THCA.
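    The Kaplan–Meier comparison of risk groups rests on the standard product-limit estimator, S(t) = product over event times t_i <= t of (1 - d_i / n_i), where d_i is the number of deaths observed at t_i and n_i the number of patients still at risk just before t_i. The minimal Python sketch below computes that estimator for a single group. The function name kaplan_meier and the input encoding (1 = death observed, 0 = censored) are assumptions for illustration, not the authors' analysis pipeline.

```python
from typing import List, Tuple


def kaplan_meier(times: List[float], events: List[int]) -> List[Tuple[float, float]]:
    """Product-limit survival estimate (hypothetical helper).

    events[i] is 1 if death was observed at times[i], 0 if censored.
    Returns (t, S(t)) at each distinct time with at least one death.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve: List[Tuple[float, float]] = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all subjects sharing time t; censored ones still count as at risk at t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            survival *= 1.0 - deaths / n_at_risk
            curve.append((t, survival))
        n_at_risk -= removed  # deaths and censorings leave the risk set after t
    return curve
```

Comparing the high- and low-risk groups would mean computing one curve per group; testing whether the curves differ (for example with a log-rank test) is a separate step not shown here.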

    Path-dependent selection—a bridge between natural selection and neutral selection

    Path-dependent selection follows the premise of complete symmetry in the neutral theory of selection: mutations in the natural world are entirely based on statistical randomness, lack directionality, and thus do not exhibit differences in fitness. Under specific spatiotemporal conditions, however, evolutionary positive feedback arising from the specific environment breaks the symmetry presupposed in neutral selection. This evolutionary positive feedback, a recursive effect, is a form of Lamarckian active selection, or inheritance of acquired characteristics. In multidimensional conditions, the antagonistic interactions between positive selection through this recursive effect and passive selection under environmental pressure give rise to evolutionary paths. Path-dependent selection proposes that the evolutionary process of organisms is a selection process based on path frequencies rather than on increases in fitness, and that it relies strongly on the paths it has taken in the past. Because transition probabilities exist between different paths or within the same path (for example, via plasmid transfer, transposons, and function transfer in ecological interactions), path formation exhibits acceleration or deceleration effects, which explains Gould’s principles such as punctuated equilibrium. When environmental selection pressure is weak or zero, most or all paths may be possible, as with neutral selection outcomes. As environmental selection increases, the frequencies of different paths diverge, and paths with higher frequencies are more easily selected. When the evolutionary process or history has no impact on the evolution of the paths themselves (a static, equilibrium state), the path with the highest frequency is the shortest or optimal path used by evolution, a result consistent with Darwin’s theory of natural selection.
    Path-dependent selection, which draws inspiration from modern physics, particularly path integral methods in quantum mechanics, may offer a new perspective on, and approach to, explaining the evolution of life.

    Securing Network Function Virtualization

    Presented on November 22, 2019 at 12:00 p.m. in the Klaus Advanced Computing Building, Room 1116. Minlan Yu is an associate professor in the School of Engineering and Applied Sciences at Harvard University. She is interested in data networking, distributed systems, enterprise and data center networks, network virtualization, and software-defined networking. Runtime: 27:21 minutes